MATHEMATICAL LIMITS ON DIFFERENCES
BETWEEN  A  POPULATION  AND  A  SUBPOPULATION 


1.  INTRODUCTION 

The  U.S.  Department  of  Energy’s  Chemical-Biological  Nonproliferation  Program 
has the task of improving the U.S. capability to prepare for and respond to the use of
chemical  and  biological  warfare  agents  against  the  civilian  population.  The  Modeling 
Subgroup  of  the  Technology  Development  Program  Area  is  responsible  for  developing 
models  of  atmospheric  transport  and  dispersion  of  chemical  warfare  agents.  To  assess 
casualties,  these  models  require  estimates  of  chemical  warfare  agent  toxicity  to  the 
civilian  population.  However,  the  present  chemical  warfare  agent  toxicity  estimates 
(Grotte  and  Yang  2001)  are  for  male  soldiers.  In  the  absence  of  data  relevant  to  the 
required soldier-to-civilian adjustment, either no adjustment is done or a guess is made.
Concern  that  guesses  at  the  required  adjustment  may  exceed  what  is  mathematically 
possible  led  to  this  report.  Because  a  subpopulation  is  part  of  the  population,  there  are 
mathematical  limits  on  how  much  a  subpopulation  can  differ  from  the  population. 
Suppose,  for  example,  that  a  resistant  subpopulation  is  20%  of  the  entire  population. 
The  extreme  placement  for  this  subpopulation  would  be  that  it  is  the  upper  20%  of  the 
population.  Then  the  median  of  this  subpopulation  would  be  at  the  90th  percentile  of  the 
population. For a normal distribution, the 90th percentile is 1.28 standard deviations
above  the  mean.  For  any  distribution  with  finite  variance,  90%  of  the  population  must  lie 
within  3.16  standard  deviations  of  the  mean  (by  Chebyshev’s  Inequality;  see,  for 
example,  Mood,  Graybill,  and  Boes  1974);  hence,  the  90th  percentile  of  the  population 
(and  the  median  of  the  subpopulation)  cannot  be  farther  away  from  the  mean  than  3.16 
standard  deviations.  Although  the  placement  of  a  subpopulation  into  the  tail  of  the 
population  distribution  is  not  realistic,  it  establishes  the  existence  of  limits  on  the 
difference  between  a  population  and  a  subpopulation.  Further,  it  reveals  three  relevant 
factors:  the  size  of  the  subpopulation,  the  standard  deviation  of  the  population,  and  the 
distribution  of  the  population.  The  next  section  develops  the  theory  and  notation  for  a 
more  reasonable  model. 
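
The two numbers in the example above can be reproduced with a short calculation. The following is a minimal sketch, assuming Python's standard library (an illustrative choice, not something the report used); the variable names are hypothetical. It computes the 90th percentile of a standard normal distribution and the Chebyshev multiplier k for which at most 10% of any finite-variance population lies k or more standard deviations from the mean.

```python
from math import sqrt
from statistics import NormalDist

theta = 0.20                       # subpopulation fraction from the example
median_pct = 1.0 - theta / 2.0     # median of the upper 20% is the 90th percentile

# Normal population: z-score of the 90th percentile.
z_normal = NormalDist().inv_cdf(median_pct)

# Chebyshev: P(|X - mean| >= k * sd) <= 1 / k**2.
# Setting 1 / k**2 equal to the 10% tail mass gives the bound on k.
k_chebyshev = 1.0 / sqrt(1.0 - median_pct)

print(f"normal 90th percentile: {z_normal:.2f} standard deviations")
print(f"Chebyshev bound:        {k_chebyshev:.2f} standard deviations")
```

The gap between the two values illustrates why the distribution of the population is listed as one of the three relevant factors.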


2.  THEORY  AND  NOTATION 

The  susceptibility  of  the  population  to  a  toxicant  was  modeled  by  a  lognormal 
distribution  of  effective  doses.  (The  theory  and  methods  are  also  applicable  to 
dosages.)  Toxicologists  characterize  a  lognormal  distribution  by  its  median  effective 
dose  (ED50)  and  its  probit  slope,  m.  The  probit  slope  is  the  reciprocal  of  the  standard 
deviation  of  log  (effective  dose),  where  log  is  the  common  (base  10)  logarithm. 

Individual  susceptibilities  are  given  in  Z  units  of  the  population  by 

$$Z = m_{pop}\,[\log(ED) - \log(ED_{50})], \qquad (1)$$

where $m_{pop}$ is the probit slope of the population and ED is the effective dose for an
individual.  Thus,  the  population  is  represented  by  a  standard  (mean  zero,  variance  one) 
normal  distribution. 

2.1  Subpopulation  Model. 

The distribution of effective doses for a subpopulation was modeled as a
lognormal distribution; hence, the distribution of log(ED) for the subpopulation follows
the bell-shaped normal curve. Let $\mu$ and $\sigma$ be the subpopulation mean and the
subpopulation standard deviation in Z units of the population; that is, $\mu$ and $\sigma$ are
calculated from the effective doses of the subpopulation after transforming the effective
doses by (1). Thus,

$$\mu = m_{pop}\,[\log(\text{subpopulation } ED_{50}) - \log(\text{population } ED_{50})] \qquad (2)$$

and

$$\sigma = \sigma_{sub} / \sigma_{pop} = m_{pop} / m_{sub}, \qquad (3)$$

where $\sigma_{sub}$ and $\sigma_{pop}$ are the standard deviations of the subpopulation and the
population, respectively, in log(ED) units and $m_{sub}$ is the probit slope of the
subpopulation. The size of the subpopulation, $\theta$, is defined as a fraction of the
population. Figure 1 shows a subpopulation of size $\theta = 0.3$. The curves in Figure 1 are
not probability densities but frequencies (normal curves fit to histograms), as
described, for example, in chapter 5 of Dixon and Massey (1969).
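
Equations (1) through (3) translate directly into code. The sketch below is illustrative only: the function names and the subpopulation values (an ED50 of 120 and a probit slope of 8) are hypothetical and were chosen just to exercise the formulas.

```python
from math import log10

def z_score(ed, ed50_pop, m_pop):
    """Equation (1): an individual's effective dose in Z units of the population."""
    return m_pop * (log10(ed) - log10(ed50_pop))

def subpop_params(ed50_sub, m_sub, ed50_pop, m_pop):
    """Equations (2) and (3): subpopulation mean and standard deviation in Z units."""
    mu = m_pop * (log10(ed50_sub) - log10(ed50_pop))   # equation (2)
    sigma = m_pop / m_sub                              # equation (3)
    return mu, sigma

# Hypothetical inputs, not taken from the report.
mu, sigma = subpop_params(ed50_sub=120.0, m_sub=8.0, ed50_pop=100.0, m_pop=5.0)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

A subpopulation that is more resistant than the population (larger ED50) gets a positive mean, and a steeper subpopulation probit slope gives a standard deviation below one.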


Figure 1. Model for a Subpopulation of Size $\theta = 0.3$ (horizontal axis: Z units)


2.2 Feasible Values for Subpopulation Parameters.


The combinations of $\mu$ and $\sigma$ that allow the subpopulation bell curve to remain
entirely within (or under) the population bell curve constitute the feasible region for $\mu$
and $\sigma$. Figure 2 shows the feasible region (shaded) for a subpopulation of size $\theta = 0.3$.
Crosier and Sommerville (2002) determined feasible values of $\mu$ and $\sigma$ by numerical
searches; Figure 2, however, was obtained from

$$\mu = \pm\,[-2\,(1 - \sigma^2)\,\ln(\theta/\sigma)]^{1/2}, \qquad (4)$$

where ln is the natural (base e) logarithm. Equation (4) gives the upper and lower limits
of the feasible range for $\mu$ as a function of $\theta$ and $\sigma$. Appendix A gives the derivation of
(4); the ranges of $\theta$ and $\sigma$ are $0 < \theta \le \sigma < 1$. When $\mu$ is plotted on the y-axis, as in
Figure 2, the feasible region is symmetrical about the x-axis; $\mu$ is positive for a resistant
subpopulation and negative for a sensitive subpopulation. Henceforth, the term feasible
region will be limited to either the resistant subpopulation case or the sensitive
subpopulation case. It is only necessary to study one case; the results apply to the
other case by symmetry. Figures 3 (minimum $\sigma$), 4 (maximum $\sigma$), and 5 (maximum $\mu$)
show subpopulations that correspond to limits of the feasible region for $\mu$ and $\sigma$ when
$\theta = 0.3$. The case shown in Figure 3 (minimum $\sigma$) is highly unlikely. To obtain such a
subpopulation, there would have to be a strong selection bias for individuals with
effective doses near the population ED50.


Figure 2. Feasible Region of $\sigma$ and $\mu$ for a Subpopulation of Size $\theta = 0.3$ (horizontal axis: standard deviation)
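
Equation (4) is simple enough to evaluate directly. The sketch below is a minimal illustration, not the code used for Figure 2; the function names `mu_limit` and `is_feasible` are hypothetical. It returns the upper limit of the feasible mean for a given size and standard deviation, and tests whether a candidate pair of parameters lies in the feasible region.

```python
from math import log, sqrt

def mu_limit(theta, sigma):
    """Upper limit of the feasible mean from equation (4).

    Requires 0 < theta <= sigma < 1; the lower limit is the negative of this value.
    """
    if not (0.0 < theta <= sigma < 1.0):
        raise ValueError("need 0 < theta <= sigma < 1")
    # max() guards against a tiny negative value from rounding when sigma == theta.
    return sqrt(max(0.0, -2.0 * (1.0 - sigma ** 2) * log(theta / sigma)))

def is_feasible(theta, sigma, mu):
    """True if (sigma, mu) lies inside the feasible region for size theta."""
    return abs(mu) <= mu_limit(theta, sigma)

# Values for theta = 0.3 quoted elsewhere in the report.
print(f"{mu_limit(0.3, 0.633):.3f}")      # limit at the maximum-mean sigma
print(is_feasible(0.3, 0.645, 0.403))     # centroid values
print(is_feasible(0.3, 0.645, 1.5))       # a mean well outside the region
```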

Note that Figure 4 has $\mu = 0$ and $\sigma = 1$; these are the expected values of the
subpopulation  parameters  when  the  subpopulation  is  a  random  sample  of  the 
population.  If  one  considers  the  age,  sex,  health,  and  physical  fitness  status  of  military 
personnel  as  irrelevant  to  their  susceptibility  to  chemical  warfare  agents,  then  military 
personnel  can  be  regarded  as  a  random  sample  of  the  general  population. 


Figure 3. Subpopulation of Size $\theta = 0.3$ with Minimum Standard Deviation (horizontal axis: Z units)


Figure 4. Subpopulation of Size $\theta = 0.3$ with Maximum Standard Deviation (horizontal axis: Z units)


Figure 5. Subpopulation of Size $\theta = 0.3$ with Maximum Mean (horizontal axis: Z units, -4 to 4)


2.3  Selection  of  Subpopulation  Parameters. 

Given the feasible region for a resistant or sensitive subpopulation of size $\theta$, how
does one select $\mu$ and $\sigma$ to represent the subpopulation? There are many choices,
which can be categorized under two philosophies: the worst-case combination of $\mu$ and
$\sigma$, or typical values of $\mu$ and $\sigma$.

For conversion of a population ED50 to a resistant subpopulation ED50, the
largest conversion factor is obtained by selecting the value of $\sigma$, call it $\sigma_x$, that yields
the maximum value for $\mu$, denoted $\mu_x$. Appendix B gives the mathematical derivation
and numerical method used to find $\sigma_x$. For conversion of a resistant subpopulation ED50
to the population ED50, the conversion factor will be less than one; the smallest
conversion factor is obtained by selecting the value of $\sigma$ that maximizes the ratio $\mu/\sigma$.
Denote the values of $\mu$ and $\sigma$ at the largest ratio by $\mu_r$ and $\sigma_r$. Generation of $\mu_r$ and $\sigma_r$ is
discussed in Appendix C.

Two estimators of typical values for $\mu$ and $\sigma$ are the mid-range and the centroid.
The mid-range estimates are $\mu_m = \mu_x / 2$ and $\sigma_m = (\theta + 1) / 2$. The formulas for the
centroid estimates $\mu_c$ and $\sigma_c$ are in the form of an integral divided by the area of the
feasible region, which is also expressed as an integral. The integral for the numerator
of $\mu_c$ has the analytic solution $(9\theta - 6\ln(\theta) - \theta^3 - 8)/9$, but the integrals for the numerator
of $\sigma_c$ and for the area of the feasible region were evaluated numerically.
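
The centroid estimates can be approximated with a few lines of numerical integration. The sketch below assumes the centroid is the ordinary area centroid of the resistant half of the feasible region (bounded by the $\sigma$ axis and the upper branch of equation (4)) and uses a midpoint rule; the report does not say which quadrature method was used, so the method, step count, and function names here are assumptions.

```python
from math import log, sqrt

def mu_limit(theta, sigma):
    """Upper branch of equation (4)."""
    return sqrt(max(0.0, -2.0 * (1.0 - sigma ** 2) * log(theta / sigma)))

def mu_numerator_analytic(theta):
    """Analytic value of the numerator integral for the centroid mean."""
    return (9.0 * theta - 6.0 * log(theta) - theta ** 3 - 8.0) / 9.0

def centroid(theta, n=20000):
    """Midpoint-rule centroid (mu_c, sigma_c) and area of the feasible region."""
    h = (1.0 - theta) / n
    area = moment_sigma = moment_mu = 0.0
    for i in range(n):
        s = theta + (i + 0.5) * h       # midpoint of the i-th strip
        top = mu_limit(theta, s)        # strip runs from mu = 0 to mu = top
        area += top * h
        moment_sigma += s * top * h
        moment_mu += 0.5 * top * top * h
    return moment_mu / area, moment_sigma / area, area

mu_c, sigma_c, area = centroid(0.3)
print(f"mu_c = {mu_c:.3f}, sigma_c = {sigma_c:.3f}")
```

For $\theta = 0.3$ the result can be compared with the centroid values quoted for Figure 1, and the numerically integrated numerator of the mean can be compared with the analytic expression above.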

Figure 1 is based on the centroid values $\mu_c = 0.403$ and $\sigma_c = 0.645$ for a resistant
subpopulation of size $\theta = 0.3$, whereas Figure 5 uses the maximum-mean values $\mu_x = 0.946$
and $\sigma_x = 0.633$ for a resistant subpopulation of size $\theta = 0.3$. The maximum-mean
case in Figure 5 might be appropriate for describing the physical fitness of military
personnel (because physical fitness is a selection criterion or requirement for military
personnel), but it may be excessive for describing susceptibility to chemical warfare
agents.


3.  RESULTS 

3.1  Subpopulation  Parameters. 

The table gives the standard deviations and means for resistant subpopulations;
for sensitive subpopulations, multiply the means by -1. The units for the standard
deviations and means in the table are Z units of the population.

3.2  Conversion  of  Toxicity  Estimates. 

Suppose a population has an ED50 of 100 and a probit slope of 5. To convert
these estimates to a sensitive subpopulation of size $\theta = 0.2$, I use the centroid values
from the table, $\mu_c = -0.508$ and $\sigma_c = 0.591$, which are in Z units of the population.
Equation (2) may be used to convert $\mu_c = -0.508$ to the median effective dose for the
subpopulation. Substituting into equation (2) yields -0.508 = 5 [log(subpopulation
ED50) - log(100)], or subpopulation ED50 = antilog [-0.508 / 5 + log(100)] = 79
(rounded). Alternatively, one may develop a conversion factor to convert the population
ED50 to the subpopulation ED50. The factor to convert from effective dose A to effective
dose B may be obtained from $ED_A$ and $ED_B$ expressed in Z units of the population,
denoted $Z_A$ and $Z_B$, respectively, by:

$$\text{Conversion Factor} = ED_B / ED_A = \text{antilog}\,[(Z_B - Z_A) / m_{pop}]. \qquad (5)$$

Applying (5) with $Z_A = 0$, $Z_B = -0.508$, and $m_{pop} = 5$ yields 0.791, so the ED50 of the
subpopulation is (100)(0.791) = 79 (rounded). To convert the population probit slope to
a probit slope for the subpopulation, I substitute $\sigma_c = 0.591$ from the table into equation
(3): $0.591 = \sigma_{sub}/\sigma_{pop} = m_{pop}/m_{sub}$, or $m_{sub} = m_{pop}/0.591$. Thus, dividing the population
probit slope ($m_{pop} = 5$) by $\sigma_c = 0.591$ yields the subpopulation probit slope:
$m_{sub} = 5 / 0.591 = 8$ (rounded).
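
The population-to-subpopulation example above can be checked in a few lines. This is a sketch only; the function name `conversion_factor` is hypothetical, and "antilog" is taken to mean a power of 10 because the report uses common logarithms.

```python
def conversion_factor(z_a, z_b, m_pop):
    """Equation (5): factor that converts effective dose A to effective dose B."""
    return 10.0 ** ((z_b - z_a) / m_pop)

ed50_pop, m_pop = 100.0, 5.0        # population values from the example
mu_c, sigma_c = -0.508, 0.591       # centroid values, sensitive subpopulation, size 0.2

cf = conversion_factor(0.0, mu_c, m_pop)   # population median sits at Z = 0
ed50_sub = ed50_pop * cf
m_sub = m_pop / sigma_c                    # equation (3) rearranged

print(f"conversion factor = {cf:.3f}")
print(f"subpopulation ED50 = {ed50_sub:.0f}, probit slope = {m_sub:.0f}")
```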

Conversions from a subpopulation to the population require that the probit slope
be converted before conversion of the ED50 because the population probit slope is
needed to convert the ED50. For example, suppose a resistant subpopulation of size
$\theta = 0.3$ has an ED50 of 200 and a probit slope of 10. First convert the probit slope: from
the table, $\sigma_c = 0.645$, and applying $m_{pop} = (m_{sub})(\sigma_c)$ gives $m_{pop} = (10)(0.645) = 6.45$.
Then use $m_{pop} = 6.45$ to convert the ED50: applying (5) with $Z_A = 0.403$, $Z_B = 0$, and
$m_{pop} = 6.45$ yields a conversion factor of 0.866. Thus, the population has an ED50 of
(200)(0.866) = 173 and a probit slope of 6 (rounded).
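
The reverse conversion, with the probit slope handled first, might be sketched as follows. The function name `subpop_to_pop` is hypothetical; the inputs are the values of the example above.

```python
def subpop_to_pop(ed50_sub, m_sub, mu, sigma):
    """Convert subpopulation estimates to population estimates.

    mu and sigma are the subpopulation mean and standard deviation
    in Z units of the population (for example, from the table).
    """
    m_pop = m_sub * sigma                  # equation (3): slope first
    cf = 10.0 ** ((0.0 - mu) / m_pop)      # equation (5) with Z_A = mu, Z_B = 0
    return ed50_sub * cf, m_pop, cf

ed50_pop, m_pop, cf = subpop_to_pop(ed50_sub=200.0, m_sub=10.0, mu=0.403, sigma=0.645)
print(f"m_pop = {m_pop:.2f}, conversion factor = {cf:.3f}, population ED50 = {ed50_pop:.0f}")
```

Computing the slope before the conversion factor mirrors the order required in the text, because the factor depends on the population probit slope.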
Table. Subpopulation Parameters

            Subpopulation Standard Deviations         Resistant Subpopulation Means
Size        Max Mean      Max Ratio     Centroid      Max Mean      Max Ratio     Centroid
($\theta$)  ($\sigma_x$)  ($\sigma_r$)  ($\sigma_c$)  ($\mu_x$)     ($\mu_r$)     ($\mu_c$)

0.001       0.285         0.002         0.446         3.223         1.000         1.408
0.002       0.301         0.003         0.450         3.020         1.000         1.318
0.003       0.312         0.005         0.452         2.896         1.000         1.263
0.004       0.320         0.007         0.455         2.805         1.000         1.222
0.005       0.327         0.008         0.456         2.733         1.000         1.190
0.006       0.333         0.010         0.458         2.672         1.000         1.164
0.007       0.338         0.012         0.460         2.621         1.000         1.141
0.008       0.343         0.013         0.461         2.575         1.000         1.120
0.009       0.347         0.015         0.463         2.535         1.000         1.102
0.010       0.351         0.016         0.464         2.498         1.000         1.086
0.020       0.381         0.033         0.475         2.245         0.999         0.973
0.030       0.402         0.049         0.484         2.086         0.998         0.902
0.040       0.419         0.066         0.492         1.968         0.996         0.850
0.050       0.434         0.082         0.500         1.873         0.993         0.807
0.060       0.447         0.098         0.507         1.793         0.990         0.772
0.070       0.458         0.115         0.514         1.723         0.987         0.741
0.080       0.469         0.131         0.520         1.661         0.983         0.714
0.090       0.480         0.147         0.527         1.605         0.978         0.689
0.100       0.489         0.163         0.533         1.554         0.974         0.667
0.110       0.499         0.178         0.539         1.507         0.968         0.646
0.120       0.507         0.194         0.545         1.463         0.962         0.627
0.130       0.516         0.210         0.551         1.422         0.956         0.609
0.140       0.524         0.225         0.557         1.384         0.949         0.592
0.150       0.532         0.240         0.563         1.347         0.942         0.576
0.160       0.540         0.255         0.569         1.313         0.935         0.561
0.170       0.547         0.270         0.575         1.280         0.927         0.547
0.180       0.555         0.285         0.580         1.248         0.919         0.533
0.190       0.562         0.300         0.586         1.218         0.910         0.520
0.200       0.569         0.314         0.591         1.189         0.901         0.508
0.250       0.602         0.383         0.619         1.059         0.853         0.451
0.300       0.633         0.447         0.645         0.946         0.800         0.403
0.350       0.663         0.507         0.672         0.846         0.743         0.360
0.400       0.691         0.563         0.698         0.756         0.683         0.321
0.450       0.719         0.614         0.723         0.673         0.623         0.286
0.500       0.746         0.662         0.749         0.596         0.562         0.253
0.550       0.772         0.707         0.774         0.523         0.501         0.222
0.600       0.798         0.748         0.799         0.455         0.441         0.193
0.650       0.824         0.787         0.825         0.390         0.381         0.166
0.700       0.849         0.823         0.850         0.328         0.323         0.139
0.750       0.875         0.857         0.875         0.269         0.266         0.114
0.800       0.900         0.889         0.900         0.212         0.210         0.090
0.850       0.925         0.919         0.925         0.156         0.156         0.066
0.900       0.950         0.947         0.950         0.103         0.103         0.044
0.950       0.975         0.974         0.975         0.051         0.051         0.021

Note: For sensitive subpopulations, multiply the means by -1.

3.3 Comparison to Blood Lead Data.


Blood  lead  data  from  the  second  National  Health  And  Nutrition  Examination 
Survey  were  analyzed  in  terms  of  subpopulation  parameters  and  compared  to  the 
subpopulation  model.  A  report  by  the  National  Center  for  Health  Statistics,  Annest,  and 
Mahaffey  (1984)  gives  geometric  mean  blood  lead  levels,  geometric  standard 
deviations,  and  population  size  for  the  target  population  (non-institutionalized  civilians 
aged  6  months  to  74  years)  and  for  134  subpopulations  defined  by  one  or  more  of  the 
factors  race,  age,  sex,  income,  and  type  of  residence  (central  city,  urban,  rural).  The 
geometric  standard  deviations  were  adjusted  for  analytical  error  before  calculation  of 
the  subpopulation  statistics  (an  estimate  of  analytical  error  was  given  in  Appendix  II  of 
the  report).  Although  blood  lead  levels  for  a  homogeneous  group  follow  a  lognormal 
distribution  (Hasselblad,  Stead,  and  Galke  1980),  there  is  no  reason  to  believe  that 
blood  lead  levels  for  the  target  population— or  for  the  subpopulations  defined  by  the 
factors  above— will  do  so.  However,  comparing  the  subpopulation  model  to  data  that 
meet  the  model  assumptions  would  seem  pointless— such  data  cannot  violate  the 
derived  mathematical  limits.  Figure  6  plots  the  absolute  deviations  of  the  subpopulation 
means  from  the  population  mean  versus  the  subpopulation  size.  For  comparison  to  the 
subpopulation model, the predicted maximum mean ($\mu_x$) is plotted as a triangle, and
the centroid estimate of the mean ($\mu_c$) is plotted as a circle, in Figure 6. Figure 7 plots
the subpopulation standard deviations versus subpopulation size. There is a trend
toward smaller standard deviations for smaller subpopulations, but the subpopulation
standard deviations are far from their lower bounds. The subpopulation model limits the
range of $\sigma$ to $\theta \le \sigma < 1$, but 17 subpopulations have estimated standard deviations
greater than one. Of these 17 subpopulations, nine have 95%, two-sided confidence
intervals for $\sigma$ that overlap one. Thus, the standard deviations greater than one appear
to be partly due to random variation and partly due to the failure of the normality
assumption (for logarithms of the data).


Figure 6. Subpopulation Mean Versus Subpopulation Size. Blood lead data from NHANES II (solid squares); centroid (circles) and maximum (triangles).


Figure 7. Subpopulation Standard Deviation Versus Subpopulation Size. Blood lead data from NHANES II; horizontal axis: subpopulation size (%).


4.  DISCUSSION 

4.1  Mixed  Distributions  and  Gender. 

The  subpopulation  model  uses  a  lognormal  distribution  for  the  susceptibility  of 
individuals in the population. Although the model includes the case $\theta = 0.5$, the model
is  not  intended  to  represent  sex  differences.  Sex  differences  may  be  better  modeled  as 
two  lognormal  distributions — one  for  each  sex.  When  combined,  the  two  lognormal 
distributions  may  produce  a  mixed  distribution  rather  than  a  lognormal  distribution.  A 
mixed  distribution  due  to  gender  effects  can  be  analyzed  by  applying  the  subpopulation 
model  to  each  gender  separately.  The  subpopulation  model  is  intended  to  represent 
subpopulations  created  by  selection  (non-random  sampling);  it  is  not  intended  to 
represent  subpopulations  that  have  a  known  biological  difference  from  the  rest  of  the 
population. 
4.2 Subpopulation Size Estimation.


For  demographically  defined  subpopulations,  estimation  of  subpopulation  size  is 
straightforward— for  example,  individuals  aged  65  years  or  older  constituted  12.4%  of 
the  U.S.  population  in  2000  (U.S.  Census  Bureau  2001a).  However,  this  simple  method 
is  not  appropriate  for  quantifying  the  size  of  the  military  subpopulation.  Currently,  the 
percentage of the population serving in the military is small; however, I consider
individuals in the military to be randomly selected from a subpopulation of healthy,
physically fit, young adults. Therefore, an estimate of the size of this subpopulation is
needed. To obtain such an estimate, Crosier and Sommerville (2002) examined
historical military demographics.

U.S.  military  strength  reached  its  peak  of  12.1  million  in  1945  (Dunnigan  and  Nofi 
1994).  The  population  of  the  United  States  in  1945  was  140  million  (U.S.  Census 
Bureau  2000).  Therefore,  about  8.6%  of  the  U.S.  population  in  1945  was  in  the  military. 
However,  military  personnel  in  World  War  II  were  nearly  all  men,  so  about  17%  of  the 
male  population  was  in  the  military  in  1945.  Besides  the  men  in  the  World  War  II 
military,  there  were  other  men  who  were  qualified  for  military  service  but  did  not  serve. 
Thus, $\theta = 0.17$ is a lower bound for the size of the subpopulation from which military
personnel  are  selected. 

U.S. men from 18 to 45 years old were liable for service in World War II
(Selective Service 2001). This age group comprised 42% of the male population in
1999 (U.S. Census Bureau 2001b). However, not every man in this age range is fit for
military service. Thus, $\theta = 0.42$ is an upper bound. A reasonable estimate would be
$\theta = (0.17 + 0.42)/2$, or $\theta = 0.30$. Because resistance to chemical warfare agents is not
necessarily a function of physical size and strength, female soldiers can be regarded as
randomly drawn from a resistant female subpopulation that is 30% of the total female
population. The estimate $\theta = 0.30$ applies to the subpopulation from which military
personnel are drawn. There are other resistant subpopulations, such as the working
population. In 2000, the U.S. workforce was 49% of the U.S. population (U.S. Census
Bureau 2001a, 2001c). As in the case of the armed forces, the actual size of the
workforce is an underestimate because many individuals who are capable of working
are not in the workforce.
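
The arithmetic behind the estimate of the military-eligible subpopulation size can be laid out step by step. The sketch below uses only figures quoted above; the doubling step assumes, as the text implies, that men were roughly half of the 1945 population, and the variable names are hypothetical.

```python
military_1945 = 12.1e6        # peak U.S. military strength, 1945
population_1945 = 140e6       # U.S. population, 1945

frac_total = military_1945 / population_1945   # share of the whole population
frac_male = 2.0 * frac_total                   # assumes men are about half the population

lower_bound = 0.17            # rounded share of men who served
upper_bound = 0.42            # share of men aged 18 to 45
theta = (lower_bound + upper_bound) / 2.0      # midpoint of the two bounds

print(f"share of population in the military: {100 * frac_total:.1f}%")
print(f"share of men in the military:        {100 * frac_male:.0f}%")
print(f"midpoint estimate of theta:          {theta:.3f}")
```

The midpoint is 0.295, which the text rounds to 0.30.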

For toxicologists, the laboratory animal is a subpopulation of interest. An
animal's health and susceptibility to toxicants vary over the animal's lifetime. The use
of young, adult animals in toxicological studies creates a selection bias in the results
that is unrelated to animal-to-human scaling. To quantify the magnitude of this selection
bias, note that 14% of the U.S. population was in the age range 15-24 years, inclusive, in
2000 (U.S. Census Bureau 2001a). The age range of animals in a toxicological study
may be very narrow, but the animals are similar to other young adult animals, so
$\theta = 0.15$ is a reasonable estimate for the size of the subpopulation of young, adult
laboratory animals.
APPENDIX A


FEASIBLE REGION

The equation for a normal curve fit to a histogram has a constant factor of
$N w / (2\pi)^{1/2}$, where $N$ is the number of data points and $w$ is the width of the class
intervals used to construct the histogram. For convenience, and without loss of
generality, I omit the constant in the following derivations. The bell curve for the
population has the equation

$$y_{pop}(z) = \exp(-z^2/2), \qquad (A1)$$

which has an area under the curve of $(2\pi)^{1/2}$ and a maximum height of one at $z = 0$.
The bell curve for a subpopulation of size $\theta = N_{sub}/N_{pop}$, $0 < \theta \le 1$, has the equation

$$y_{sub}(z) = (\theta/\sigma)\,\exp[-(z-\mu)^2/(2\sigma^2)], \qquad (A2)$$

which has an area under the curve of $\theta\,(2\pi)^{1/2}$ and a maximum height of $\theta/\sigma$ at $z = \mu$.

For fixed $\theta$ and $\sigma$, there is a feasible range over which $\mu$ can vary without
violating the condition that the subpopulation bell curve lie underneath the bell curve of
the population. When $\mu$ attains a limit of its feasible range for fixed $\theta$ and $\sigma$, the
subpopulation bell curve touches the population bell curve at the contact point. Denote
the Z coordinate of the contact point $z_b$. At the contact point, the heights of the two bell
curves are equal, $y_{pop}(z_b) = y_{sub}(z_b)$, or

$$\exp(-z_b^2/2) = (\theta/\sigma)\,\exp[-(z_b-\mu)^2/(2\sigma^2)]. \qquad (A3)$$

Also, at the contact point, the derivatives ($dy/dz$) of the two curves are equal:

$$\exp(-z_b^2/2)\,(-z_b) = (\theta/\sigma)\,\exp[-(z_b-\mu)^2/(2\sigma^2)]\,(-1/(2\sigma^2))\,2\,(z_b-\mu). \qquad (A4)$$

Combining (A3) and (A4) yields

$$-z_b = (-1/(2\sigma^2))\,2\,(z_b-\mu). \qquad (A5)$$

Multiplying both sides of (A5) by $\sigma^2$ and simplifying yields

$$-z_b\sigma^2 = -z_b + \mu. \qquad (A6)$$

Adding $z_b$ to both sides of (A6), factoring $z_b - z_b\sigma^2$ to $z_b(1-\sigma^2)$, and solving for $z_b$
yields

$$z_b = \mu/(1-\sigma^2), \qquad (A7)$$

where, to avoid division by zero, $\sigma = 1$ is not allowed. Substituting $\mu/(1-\sigma^2)$ for $z_b$ in
(A3) yields

$$\exp\{-[\mu/(1-\sigma^2)]^2/2\} = (\theta/\sigma)\,\exp\{-[\mu/(1-\sigma^2)-\mu]^2/(2\sigma^2)\}. \qquad (A8)$$

Taking natural logarithms of both sides of (A8) gives

$$-[\mu/(1-\sigma^2)]^2/2 = \ln(\theta/\sigma) - [\mu/(1-\sigma^2)-\mu]^2/(2\sigma^2). \qquad (A9)$$

Multiplying out the squares within brackets gives

$$-[\mu^2/(1-\sigma^2)^2]/2 = \ln(\theta/\sigma) - [\mu^2/(1-\sigma^2)^2 - 2\mu^2/(1-\sigma^2) + \mu^2]/(2\sigma^2). \qquad (A10)$$

Moving the second term of the right side to the left side and factoring out $\mu^2$ gives

$$\mu^2\,\{-[1/(1-\sigma^2)^2]/2 + [1/(1-\sigma^2)^2 - 2/(1-\sigma^2) + 1]/(2\sigma^2)\} = \ln(\theta/\sigma). \qquad (A11)$$

The expression within braces can be simplified. Start by multiplying and dividing by
the factors to obtain

$$-\frac{1}{2(1-\sigma^2)^2} + \frac{1}{2\sigma^2(1-\sigma^2)^2} - \frac{2}{2\sigma^2(1-\sigma^2)} + \frac{1}{2\sigma^2}. \qquad (A12)$$

Placing all the terms of (A12) on a common denominator gives

$$[-\sigma^2 + 1 - 2(1-\sigma^2) + (1-\sigma^2)^2]\,/\,[2\sigma^2(1-\sigma^2)^2]. \qquad (A13)$$

The first two terms, $-\sigma^2 + 1$, are $1-\sigma^2$, which makes $1-\sigma^2$ a factor common to the
numerator and the denominator; canceling out the common factor yields

$$[1 - 2 + (1-\sigma^2)]\,/\,[2\sigma^2(1-\sigma^2)]. \qquad (A14)$$

The numerator simplifies to $-\sigma^2$, which cancels the $\sigma^2$ in the denominator; so
$-1/[2(1-\sigma^2)]$ is the expression within the braces of (A11). Hence (A11) becomes

$$\mu^2\,\{-1/[2(1-\sigma^2)]\} = \ln(\theta/\sigma). \qquad (A15)$$

Multiplying both sides of (A15) by $-2(1-\sigma^2)$ yields

$$\mu^2 = -2\,(1-\sigma^2)\,\ln(\theta/\sigma). \qquad (A16)$$

The square root of (A16) is equation (4) of the text.
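
The tangency conditions used in this derivation can be checked numerically. The sketch below is illustrative only: it takes $\theta = 0.3$ and $\sigma = 0.633$ as example values, sets the mean at the limit from (A16), and confirms that the two curves have equal height and slope at the contact point from (A7) and that the subpopulation curve never rises above the population curve on a grid. The grid range and spacing are arbitrary choices.

```python
from math import exp, log, sqrt

def y_pop(z):
    """Population bell curve, equation (A1), constant factor omitted."""
    return exp(-z * z / 2.0)

def y_sub(z, theta, sigma, mu):
    """Subpopulation bell curve, equation (A2), constant factor omitted."""
    return (theta / sigma) * exp(-((z - mu) ** 2) / (2.0 * sigma ** 2))

theta, sigma = 0.3, 0.633                                    # example values
mu = sqrt(-2.0 * (1.0 - sigma ** 2) * log(theta / sigma))    # limit from (A16)
z_b = mu / (1.0 - sigma ** 2)                                # contact point, (A7)

# Heights agree at the contact point, as required by (A3).
height_gap = y_pop(z_b) - y_sub(z_b, theta, sigma, mu)

# Slopes agree at the contact point, as required by (A4).
slope_pop = -z_b * y_pop(z_b)
slope_sub = -(z_b - mu) / sigma ** 2 * y_sub(z_b, theta, sigma, mu)
slope_gap = slope_pop - slope_sub

# The subpopulation curve stays under the population curve on a grid of z values.
grid = [i / 1000.0 for i in range(-6000, 6001)]
min_gap = min(y_pop(z) - y_sub(z, theta, sigma, mu) for z in grid)

print(f"z_b = {z_b:.3f}, height gap = {height_gap:.1e}, slope gap = {slope_gap:.1e}")
print(f"smallest gap on grid = {min_gap:.1e}")
```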
APPENDIX B


MAXIMUM SUBPOPULATION MEAN

Because the equation

$$\mu = \pm\,[-2\,(1-\sigma^2)\,\ln(\theta/\sigma)]^{1/2} \qquad (B1)$$

gives the maximum feasible $\mu$ as a function of $\theta$ and $\sigma$, it is possible to obtain, for fixed
$\theta$, the $\sigma$ at which the maximum feasible $\mu$ occurs by setting the derivative of (B1) with
respect to $\sigma$ equal to zero, and solving for $\sigma$. Monotonic transformations are often used
to simplify this process: if $\mu$ attains its maximum at $\sigma = \sigma_x$, then any monotonic
transformation of $\mu$ will have a maximum at $\sigma = \sigma_x$. Thus, squaring (B1) gives

$$\mu^2 = -2\,(1-\sigma^2)\,\ln(\theta/\sigma). \qquad (B2)$$

Dividing by 2, noting that $-1\,(1-\sigma^2) = (\sigma^2-1)$, and separating $\ln(\theta/\sigma)$ gives

$$\mu^2/2 = (\sigma^2-1)\,[\ln(\theta) - \ln(\sigma)], \qquad (B3)$$

or

$$\mu^2/2 = \sigma^2\ln(\theta) - \sigma^2\ln(\sigma) - \ln(\theta) + \ln(\sigma). \qquad (B4)$$

Taking the derivative of (B4) with respect to $\sigma$ gives

$$d(\mu^2/2)/d\sigma = 2\sigma\ln(\theta) - \sigma^2/\sigma - 2\sigma\ln(\sigma) + 1/\sigma. \qquad (B5)$$

Setting (B5) equal to zero fixes the value of $\sigma$ at $\sigma_x$:

$$2\sigma_x\ln(\theta) - \sigma_x - 2\sigma_x\ln(\sigma_x) + 1/\sigma_x = 0. \qquad (B6)$$

Multiplying by $-1/(2\sigma_x)$ gives

$$-\ln(\theta) + 1/2 + \ln(\sigma_x) - 1/(2\sigma_x^2) = 0, \qquad (B7)$$

which is not readily solved for $\sigma_x$ as a function of $\theta$. Thus, for fixed $\theta$, I used a binary
search (also known as bisection) to bracket $\sigma_x$. The initial bounds for $\sigma_x$ were 0 and 1;
iteration continued until the difference between the lower bound and the upper bound
was less than $10^{-6}$. Then (B1) was used to obtain $\mu_x$ from $\sigma_x$.
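
The bisection search described above might look like the following sketch. It follows the stated bounds (0 and 1) and tolerance, but the function names and the way the final midpoint is reported are assumptions rather than details taken from the report.

```python
from math import log, sqrt

def b7(sigma, theta):
    """Left side of equation (B7); it increases with sigma on (0, 1)."""
    return -log(theta) + 0.5 + log(sigma) - 1.0 / (2.0 * sigma ** 2)

def max_mean(theta, tol=1e-6):
    """Bisection for sigma_x, then equation (B1) for mu_x."""
    lo, hi = 0.0, 1.0               # (B7) is negative near 0 and positive at 1
    while hi - lo >= tol:
        mid = 0.5 * (lo + hi)       # mid is never 0, so log(mid) is defined
        if b7(mid, theta) < 0.0:
            lo = mid                # root lies above mid
        else:
            hi = mid                # root lies at or below mid
    sigma_x = 0.5 * (lo + hi)
    mu_x = sqrt(-2.0 * (1.0 - sigma_x ** 2) * log(theta / sigma_x))
    return sigma_x, mu_x

for theta in (0.2, 0.3):
    sigma_x, mu_x = max_mean(theta)
    print(f"theta = {theta}: sigma_x = {sigma_x:.3f}, mu_x = {mu_x:.3f}")
```

The results for $\theta = 0.2$ and $\theta = 0.3$ can be compared with the Max Mean columns of the table.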
APPENDIX C


EXTREME SUBPOPULATION-TO-POPULATION CONVERSION

When estimating the population ED50 from a subpopulation ED50 and the
subpopulation probit slope, the extreme case for the conversion process is not at $\mu = \mu_x$. To
find the combination of $\mu$ and $\sigma$ yielding the largest difference between the
subpopulation median and the population median, start with the equation for the
conversion factor (CF) of a subpopulation ED50 to the population ED50:

$$CF = \text{antilog}\,[(0-\mu)/(\sigma\,m_{sub})]. \qquad (C1)$$

Equation (C1) can be rewritten as $CF = \text{antilog}\,[(-1/m_{sub})(\mu/\sigma)]$, from which it is
clear that the CF is most extreme (smallest, because the CF is less than one for a
resistant subpopulation) when the ratio $\mu/\sigma$ is maximized. To proceed from (C1),
take logarithms of both sides of (C1) and substitute $[-2\,(1-\sigma^2)\,\ln(\theta/\sigma)]^{1/2}$ from
equation (4) of the text for $\mu$:

$$\log(CF) = \{0 - [-2\,(1-\sigma^2)\,\ln(\theta/\sigma)]^{1/2}\}\,/\,(\sigma\,m_{sub}). \qquad (C2)$$

Multiply both sides by $m_{sub}$ and then square both sides:

$$[m_{sub}\log(CF)]^2 = [-2\,(1-\sigma^2)\,\ln(\theta/\sigma)]\,/\,\sigma^2. \qquad (C3)$$

Moving the $\sigma^2$ of the denominator into the numerator and rewriting $\theta/\sigma$ as $\theta\sigma^{-1}$ yields

$$[m_{sub}\log(CF)]^2 = -2\,(\sigma^{-2}-1)\,\ln(\theta\sigma^{-1}). \qquad (C4)$$

Now take the derivative with respect to $\sigma$ and set the derivative equal to zero. Setting
the derivative to zero fixes the value of $\sigma$, so it is denoted $\sigma_r$.

$$-2\,[-2\sigma_r^{-3}\,\ln(\theta\sigma_r^{-1}) + (\sigma_r^{-2}-1)\,(-\theta\sigma_r^{-2}/\theta\sigma_r^{-1})] = 0. \qquad (C5)$$

Simplify $(-\theta\sigma_r^{-2}/\theta\sigma_r^{-1})$ to $(-\sigma_r^{-1})$ and multiply both sides by $-\sigma_r^3/2$:

$$-2\ln(\theta\sigma_r^{-1}) + \sigma_r^3\,(\sigma_r^{-2}-1)\,(-\sigma_r^{-1}) = 0. \qquad (C6)$$

Multiplying out $\sigma_r^3\,(\sigma_r^{-2}-1)\,(-\sigma_r^{-1})$ reduces (C6) to

$$-2\ln(\theta/\sigma_r) + (\sigma_r^2-1) = 0. \qquad (C7)$$

Separating $-2\ln(\theta/\sigma_r)$ to $-2\ln(\theta) + 2\ln(\sigma_r)$, dividing both sides by 2, and adding $\ln(\theta)$
to both sides produces

$$\ln(\sigma_r) + (\sigma_r^2-1)/2 = \ln(\theta). \qquad (C8)$$

Subtracting $(\sigma_r^2-1)/2$ from both sides of (C8) followed by exponentiation of both
sides yields

$$\sigma_r = \theta\,\exp[(1-\sigma_r^2)/2]. \qquad (C9)$$

Equation (C9) was solved numerically (by iteration) to obtain $\sigma_r$. The starting
value for $\sigma_r$ was $(\theta+1)/2$. Because an overestimate of $\sigma_r$ [as the input on the right
side of (C9)] yields an underestimate [as the output on the left side of (C9)] and vice
versa, the input and output values of $\sigma_r$ were averaged to obtain the input for the next
iteration. Iteration continued until the difference between the input value and the output
value was less than $10^{-6}$. In Figure 2 of the text, the point ($\sigma_r$, $\mu_r$) can be found by
drawing a line from the origin tangent to the feasible region; the line will have slope
$= \mu_r/\sigma_r$.
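
The averaged iteration described above can be sketched as follows. The starting value, the averaging step, and the stopping rule follow the description; the function name, the iteration cap, and the choice to report the last output value are assumptions.

```python
from math import exp, log, sqrt

def max_ratio(theta, tol=1e-6, max_iter=1000):
    """Averaged fixed-point iteration on equation (C9), then equation (4) for mu_r."""
    s_in = (theta + 1.0) / 2.0                          # starting value
    for _ in range(max_iter):
        s_out = theta * exp((1.0 - s_in ** 2) / 2.0)    # right side of (C9)
        if abs(s_out - s_in) < tol:
            break
        s_in = 0.5 * (s_in + s_out)                     # average input and output
    else:
        raise RuntimeError("iteration did not converge")
    sigma_r = s_out
    mu_r = sqrt(-2.0 * (1.0 - sigma_r ** 2) * log(theta / sigma_r))
    return sigma_r, mu_r

sigma_r, mu_r = max_ratio(0.3)
print(f"sigma_r = {sigma_r:.3f}, mu_r = {mu_r:.3f}, ratio = {mu_r / sigma_r:.2f}")
```

For $\theta = 0.3$ the result can be compared with the Max Ratio columns of the table; the ratio exceeds the value of about 1.49 obtained at the maximum-mean point (0.946 / 0.633).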